Disentangling diagnostic object properties for human scene categorization

It usually only takes a single glance to categorize our environment into different scene categories (e.g. a kitchen or a highway). Object information has been suggested to play a crucial role in this process, and some proposals even claim that the recognition of a single object can be sufficient to categorize the scene around it. Here, we tested this claim in four behavioural experiments by having participants categorize real-world scene photographs that were reduced to a single, cut-out object. We show that single objects can indeed be sufficient for correct scene categorization and that scene category information can be extracted within 50 ms of object presentation. Furthermore, we identified object frequency and specificity for the target scene category as the most important object properties for human scene categorization. Interestingly, despite the statistical definition of specificity and frequency, human ratings of these properties were better predictors of scene categorization behaviour than more objective statistics derived from databases of labelled real-world images. Taken together, our findings support a central role of object information during human scene categorization, showing that single objects can be indicative of a scene category if they are assumed to frequently and exclusively occur in a certain environment.


Experiment 1: Instructions for diagnosticity and anchorness ratings
The following instructions (original in German) were presented before the rating blocks and, as a reminder, at the beginning of each block in Experiment 1: First, please indicate how diagnostic the presented object is for the indicated scene category. Diagnostic here means how unambiguously one can infer the indicated scene category from the presented object. A beach chair, for example, is very diagnostic for the category beach because beach chairs usually only occur in beach scenes. One can therefore normally "make the diagnosis" beach based on the object beach chair. In contrast, a towel is not very diagnostic for the category beach because towels also occur, for example, in bathroom and swimming pool scenes.
Second, please indicate whether the presented object is an anchor object for other objects. Anchor objects are objects that can provide information regarding the presence and position of other related objects. Anchor objects are usually large and stationary whereas the related objects are usually small and movable. For example, a sink is an anchor object for the soap because the soap can usually be found on top of the sink. Similarly, a shower is an anchor object for shampoo because shampoo can usually be found inside the shower. On the other hand, a football is not an anchor object because it is movable and does not provide reliable information regarding the presence and position of other objects around it. Neither is a fork, which usually occurs together with a knife, but which is easily movable in space.

Observed (a) superordinate-level and (b) basic-level scene categorization accuracy in Experiment 2 as a function of object size on screen and object recognition.
Note. Points represent data averaged across participants for each individual image. The dashed lines indicate the expected chance level in the superordinate-level (50%) and basic-level (6.25%) scene categorization tasks.

Supplementary Figure S3
Correlations between different measures of object specificity and frequency in Experiment 2.
Note. SPC = Specificity, FRQ = Frequency, ADE = ADE20K dataset, ADE subset = ADE reduced to the 16 categories (and their synonyms) used in the experiment. Frequency measures for ADE subset and ADE are identical. See text for details.
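
To make the two database-derived measures concrete, the following is a minimal sketch of how specificity and frequency could be estimated from a set of labelled images. It assumes, purely for illustration, that frequency is the proportion of images of a scene category that contain the object, and that specificity is the proportion of images containing the object that belong to that scene category. The exact definitions used for the ADE20K-based measures are given in the main text and may differ. The function name `object_scene_statistics` and the toy data are hypothetical.

```python
from collections import Counter


def object_scene_statistics(images):
    """Estimate frequency and specificity for every (object, scene) pair.

    `images` is an iterable of (scene_category, object_labels) tuples.
    Assumed definitions (illustrative, not taken from the paper):
      frequency   = images of the scene containing the object / images of the scene
      specificity = images of the scene containing the object / images containing the object
    """
    scene_counts = Counter()
    object_counts = Counter()
    joint_counts = Counter()
    for scene, objects in images:
        scene_counts[scene] += 1
        # Count each object at most once per image.
        for obj in set(objects):
            object_counts[obj] += 1
            joint_counts[(obj, scene)] += 1

    frequency = {
        (obj, scene): n / scene_counts[scene]
        for (obj, scene), n in joint_counts.items()
    }
    specificity = {
        (obj, scene): n / object_counts[obj]
        for (obj, scene), n in joint_counts.items()
    }
    return frequency, specificity


# Toy data loosely following the beach chair / towel example in the rating instructions.
toy_images = [
    ("beach", {"beach chair", "towel"}),
    ("beach", {"towel"}),
    ("bathroom", {"towel", "sink"}),
    ("bathroom", {"sink"}),
]
frequency, specificity = object_scene_statistics(toy_images)
```

In this toy set the towel appears in every beach image but also in a bathroom image, so under the assumed definitions it is frequent but not fully specific for the beach category, whereas the beach chair is specific but less frequent.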

Supplementary Figure S4
Observed basic-level scene categorization accuracy in Experiments 2 and 3 as a function of (a) object size on screen and (b) eccentricity.
Note. Points represent mean accuracy (% correct) across participants for each individual image.

Supplementary Figure S5
Observed change in the usage rating and basic-level scene categorization accuracy from Experiment 2 to 3 as a function of the object size on screen (before resizing in Experiment 3) for (a) indoor and (b) outdoor scenes.

Supplementary Table S1
Results of models predicting scene categorization accuracy in Experiment 1.
Note. The covariates object size and eccentricity were standardized.

Supplementary Table S2
Results of models derived from the Lasso procedure for basic-level scene categorization.
Note. Displayed are fixed effects of the full models including random slopes for all predictors. The covariates object size and eccentricity were standardized. See text for details.

Supplementary Table S3
Results of the optimal models derived from the Lasso procedure predicting basic-level scene categorization accuracy in Experiment 2 for subsets of indoor and outdoor scenes.
Note. Displayed are fixed effects of the full models including random slopes for all predictors. The covariates object size and eccentricity were standardized. Movability was inverted (higher values indicating stationary objects).
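
The preprocessing steps mentioned in the table notes can be sketched as follows. This is a minimal illustration, assuming that standardization means z-scoring with the sample standard deviation and that inversion mirrors a rating around the midpoint of its scale. The helper names, the rating scale bounds, and the example values are hypothetical and not taken from the study.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score a covariate: subtract the mean, divide by the sample SD."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def invert_rating(values, low, high):
    """Mirror ratings on a [low, high] scale so that high values become low.

    Applied to movability, higher inverted values then indicate
    more stationary objects.
    """
    return [low + high - v for v in values]


# Hypothetical object sizes (arbitrary units) and movability ratings on an assumed 1-6 scale.
object_size_z = standardize([1.0, 2.0, 3.0])
stationarity = invert_rating([1, 6, 4], low=1, high=6)
```

Standardizing the covariates puts object size and eccentricity on a common scale, so their fixed-effect estimates can be compared in units of standard deviations.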